Indexing Earth Mover's Distance over Network Metrics

Authors

  • Ting Wang
  • Shicong Meng
  • Jiang Bian
Abstract

The Earth Mover’s Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for Lp feature spaces (e.g., Euclidean space), whereas in a spectrum of applications the relationships between features are better captured using networks. In this paper, we study the problem of answering k-nearest neighbor (k-NN) queries under network-based EMD metrics (NEMD). We propose OASIS, a new access method that leverages the network structure of the feature space and enables efficient NEMD-based similarity search. Specifically, OASIS employs three novel techniques: (i) Range Oracle, a scalable model that estimates the range of the k-th nearest neighbor under NEMD; (ii) Boundary Index, a structure that efficiently fetches candidates within a given range; and (iii) Network Compression Hierarchy, an incremental filtering mechanism that effectively prunes false-positive candidates to avoid unnecessary computation. Through extensive experiments using both synthetic and real datasets, we confirmed that OASIS significantly outperforms the state-of-the-art methods in query processing cost.
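The abstract does not say how a single NEMD value is evaluated. One standard way, assuming the ground distance between two features is the weighted shortest-path distance in the feature network, uses the fact that EMD under a shortest-path metric equals the cost of an uncapacitated minimum-cost flow (transshipment) on the network itself: mass moves along edges, and any optimal transport plan can be routed along shortest paths. The Python sketch below computes this with successive shortest paths (Bellman-Ford on the residual graph). The function name `network_emd`, the `(u, v, w)` edge-list input, and the tolerance `eps` are hypothetical choices for illustration; this is a plain baseline for computing one distance, not the OASIS method, which the abstract describes only at the level of its three components.

```python
import math
from dataclasses import dataclass


@dataclass
class Arc:
    to: int      # head node
    cap: float   # residual capacity
    cost: float  # cost per unit of flow
    rev: int     # index of the paired reverse arc in adj[to]


def _add_arc(adj, u, v, cap, cost):
    # Forward arc plus a zero-capacity reverse arc for the residual graph.
    adj[u].append(Arc(v, cap, cost, len(adj[v])))
    adj[v].append(Arc(u, 0.0, -cost, len(adj[u]) - 1))


def network_emd(n, edges, p, q, eps=1e-12):
    """EMD between histograms p and q over nodes 0..n-1 of an undirected
    network, with ground distance = weighted shortest-path distance.
    edges: list of (u, v, w) with w >= 0 (hypothetical input format)."""
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("p and q must carry the same total mass")
    src, snk = n, n + 1  # super source and super sink
    adj = [[] for _ in range(n + 2)]
    for u, v, w in edges:
        # Network edges are uncapacitated and usable in both directions.
        _add_arc(adj, u, v, math.inf, w)
        _add_arc(adj, v, u, math.inf, w)
    for i in range(n):
        if p[i] > 0:
            _add_arc(adj, src, i, p[i], 0.0)  # supply at node i
        if q[i] > 0:
            _add_arc(adj, i, snk, q[i], 0.0)  # demand at node i
    cost, flow = 0.0, 0.0
    while True:
        # Bellman-Ford, because residual arcs can have negative cost.
        dist = [math.inf] * (n + 2)
        prev = [None] * (n + 2)  # (tail node, arc index) on the shortest path
        dist[src] = 0.0
        for _ in range(n + 1):
            changed = False
            for u in range(n + 2):
                if dist[u] == math.inf:
                    continue
                for k, a in enumerate(adj[u]):
                    if a.cap > eps and dist[u] + a.cost < dist[a.to] - eps:
                        dist[a.to] = dist[u] + a.cost
                        prev[a.to] = (u, k)
                        changed = True
            if not changed:
                break
        if dist[snk] == math.inf:
            break  # no augmenting path left
        # Bottleneck capacity (finite, since the path starts with a source arc).
        push, v = math.inf, snk
        while v != src:
            u, k = prev[v]
            push = min(push, adj[u][k].cap)
            v = u
        # Augment along the path and update residual capacities.
        v = snk
        while v != src:
            u, k = prev[v]
            a = adj[u][k]
            a.cap -= push
            adj[v][a.rev].cap += push
            v = u
        cost += push * dist[snk]
        flow += push
    if abs(flow - sum(p)) > 1e-9:
        raise ValueError("network is disconnected; mass cannot be transported")
    return cost


edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 5.0)]
print(network_emd(3, edges, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 2.0 (route 0-1-2)
```

Working on the network directly avoids materializing the full n-by-n shortest-path ground-distance matrix, but each augmentation still runs Bellman-Ford, so evaluating NEMD this way is only practical for small networks. That per-pair cost is the kind of expense that candidate filtering in a k-NN search aims to avoid.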


Similar resources

Earth Mover's Distance and Equivalent Metrics for Spaces with Semigroups

introduce a multi-scale metric on a space equipped with a diffusion semigroup. We prove, under some technical conditions, that the norm dual to the space of Lipschitz functions with respect to this metric is equivalent to two other norms, one of which is a weighted sum of the averages at each scale, and one of which is a weighted sum of the difference of averages across scales. The notion of 's...


Earth Mover’s Distance and Equivalent Metrics for Spaces with Hierarchical Partition trees

define four metrics between probability measures on a space equipped with a hierarchical partition tree, and prove their equivalence. Similar metrics have previously been defined in more restrictive settings; in particular, the well-known Earth Mover's Distance is widely used in machine learning. We adapt the definitions of these metrics to a much broader class of geometries, and use machinery ...


Supervised Earth Mover's Distance Learning and Its Computer Vision Applications

Earth Mover’s Distance (EMD) is an intuitive and natural distance metric for comparing two histograms or probability distributions. We propose to jointly optimize the ground distance matrix and the EMD flow-network based on partial ordering of histogram distances in an optimization framework. Two applications in computer vision are used to demonstrate the effectiveness of the algorithm: firstly...


Sublinear Algorithms for Earth Mover's Distance

We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0, A], with sample complexities independent of domain size permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other param...


Approximation Techniques for Indexing the Earth Mover's Distance in Multimedia Databases

Today's abundance of storage coupled with digital technologies in virtually any scientific or commercial application such as medical and biological imaging or music archives deal with tremendous quantities of images, videos or audio files stored in large multimedia databases. For content-based data mining and retrieval purposes suitable similarity models are crucial. The Earth Mover’s Distance w...



Journal:
  • IEEE Trans. Knowl. Data Eng.

Volume: 27, Issue: -

Pages: -

Publication date: 2015